Co-training using prosodic and lexical information for sentence segmentation
Authors
Abstract
We investigate applying the co-training learning algorithm to the sentence boundary classification problem using lexical and prosodic information. Co-training is a semi-supervised machine learning algorithm that trains multiple weak classifiers on a relatively small amount of labeled data and incrementally incorporates unlabeled data. The assumption in co-training is that the classifiers can train each other, since one can label samples that are difficult for the other. Sentence segmentation is well suited to co-training because it satisfies the algorithm's main requirement: the dataset can be described by two disjoint, natural views that are redundantly sufficient. In our case, the two feature sets capture lexical and prosodic information. Experimental results on the ICSI Meeting (MRDA) corpus show the effectiveness of the co-training algorithm for this task.
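The co-training loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the margin-based confidence score, the function names (`centroid_fit`, `centroid_predict`, `co_train`), the `rounds` and `per_round` parameters, and the toy one-dimensional "prosodic" and "lexical" features are all assumptions chosen to keep the example self-contained.

```python
def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class (illustrative weak learner)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_predict(model, x):
    """Return (label, confidence); confidence is the squared-distance margin
    between the two nearest centroids. Assumes at least two classes."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(x, mu)), c)
                   for c, mu in model.items())
    (d0, c0), (d1, _) = dists[0], dists[1]
    return c0, d1 - d0


def co_train(view_a, view_b, labels, rounds=10, per_round=2):
    """Co-training over two views. labels[i] is a class for labeled samples
    and None for unlabeled ones; the seed set must contain both classes."""
    labels = list(labels)
    for _ in range(rounds):
        labeled = [i for i, l in enumerate(labels) if l is not None]
        unlabeled = [i for i, l in enumerate(labels) if l is None]
        if not unlabeled:
            break
        y = [labels[i] for i in labeled]
        new_labels = {}
        for view in (view_a, view_b):
            # Train one classifier per view on the current labeled pool.
            model = centroid_fit([view[i] for i in labeled], y)
            # Each classifier labels the samples it is most confident about,
            # and those samples join the shared pool used by both views.
            scored = sorted(((centroid_predict(model, view[i]), i) for i in unlabeled),
                            key=lambda t: -t[0][1])
            for (label, _), i in scored[:per_round]:
                new_labels.setdefault(i, label)
        for i, label in new_labels.items():
            labels[i] = label
    labeled = [i for i, l in enumerate(labels) if l is not None]
    y = [labels[i] for i in labeled]
    model_a = centroid_fit([view_a[i] for i in labeled], y)
    model_b = centroid_fit([view_b[i] for i in labeled], y)
    return model_a, model_b, labels


# Toy data (invented for illustration): 1 = sentence boundary, 0 = no boundary.
# View A stands in for a prosodic feature (e.g. pause length), view B for a lexical score.
prosodic = [[0.1], [0.9], [0.15], [0.2], [0.05], [0.85], [1.0], [0.95], [0.12], [1.1]]
lexical = [[0.2], [0.8], [0.1], [0.3], [0.25], [0.9], [0.7], [0.75], [0.15], [0.85]]
true_labels = [0, 1, 0, 0, 0, 1, 1, 1, 0, 1]
seed_labels = [0, 1] + [None] * 8  # only two labeled examples to start

model_a, model_b, final_labels = co_train(prosodic, lexical, seed_labels)
```

In this sketch, each round retrains both classifiers on the shared labeled pool, so a sample confidently labeled from the prosodic view also becomes training data for the lexical classifier, and vice versa.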
Similar resources
Prosody-Based Automatic Segmentation of Speech into Sentences and Topics
A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and...
Using Prosody for Automatic Sentence Segmentation of Multi-party Meetings
We explore the use of prosodic features beyond pauses, including duration, pitch, and energy features, for automatic sentence segmentation of ICSI meeting data. We examine two different approaches to boundary classification: score-level combination of independent language and prosodic models using HMMs, and feature-level combination of models using a boosting-based method (BoosTexter). We repor...
[...] for Sentence Unit Segmentation from Speech (Sébastien Cuendet)
The sentence segmentation task is a classification task that aims at inserting sentence boundaries in a sequence of words. One of the applications of sentence segmentation is to detect the sentence boundaries in the sequence of words that is output by an automatic speech recognition system (ASR). The purpose of correctly finding the sentence boundaries in ASR transcriptions is to make it possib...
Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We e...
Prosody Modeling for Automatic Speech Recognition and Understanding
This paper summarizes statistical modeling approaches for the use of prosody (the rhythm and melody of speech) in automatic recognition and understanding of speech. We outline effective prosodic feature extraction, model architectures, and techniques to combine prosodic with lexical (word-based) information. We then survey a number of applications of the framework, and give results for automati...